Overlapped Speech Detection and speaker counting using distant microphone arrays

Authors

Abstract

We study the problem of detecting and counting simultaneous, overlapping speakers in a multichannel, distant-microphone scenario. Focusing on a supervised learning approach, we treat Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), joint VAD and OSD (VAD+OSD), and speaker counting in a unified way, as instances of a general Overlapped Speech Detection and Counting (OSDC) multi-class supervised learning problem. We consider a Temporal Convolutional Network (TCN) and a Transformer-based architecture for this task, and compare them with previously proposed state-of-the-art methods based on Recurrent Neural Networks (RNN) or hybrid Convolutional-Recurrent Neural Networks (CRNN). In addition, we propose ways of exploiting multichannel input by means of early or late fusion of single-channel features with spatial features extracted from one or more microphone pairs. We conduct an extensive experimental evaluation on the AMI and CHiME-6 datasets and on a purposely made multichannel synthetic dataset. We show that the Transformer-based architecture performs best among all architectures, and that neural network based spatial localization features outperform signal-based spatial features and significantly improve performance compared to single-channel features only. Finally, we find that training with a speaker counting objective improves OSD compared to training with a VAD+OSD objective.
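The unified OSDC framing can be illustrated with a small sketch: if each frame is labelled with the number of active speakers, VAD and OSD decisions fall out of the same multi-class output. The example below is only an illustration under stated assumptions, not the authors' implementation. The cap of four speakers (`MAX_COUNT`), the function names, and the way VAD and OSD scores are read off the class posteriors are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Sequence

# Assumption: frames with more speakers than this are merged into the last class.
MAX_COUNT = 4


def count_labels(speaker_activity: Sequence[Sequence[int]],
                 max_count: int = MAX_COUNT) -> List[int]:
    """Per-frame number of active speakers, capped at max_count.

    speaker_activity[s][t] is 1 if speaker s talks in frame t, else 0.
    """
    n_frames = len(speaker_activity[0])
    return [
        min(sum(spk[t] for spk in speaker_activity), max_count)
        for t in range(n_frames)
    ]


def vad_from_posteriors(posteriors: Sequence[float]) -> float:
    """Score for 'at least one speaker': 1 - P(count = 0)."""
    return 1.0 - posteriors[0]


def osd_from_posteriors(posteriors: Sequence[float]) -> float:
    """Score for 'two or more speakers': sum of P(count = k) for k >= 2."""
    return sum(posteriors[2:])


# Toy example: three speakers, five frames.
activity = [
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
labels = count_labels(activity)

# Hypothetical class posteriors for one frame (classes 0, 1, 2, 3, 4+).
frame_posteriors = [0.1, 0.6, 0.2, 0.1, 0.0]
vad_score = vad_from_posteriors(frame_posteriors)
osd_score = osd_from_posteriors(frame_posteriors)
```

In this sketch a single counting classifier serves all three tasks: thresholding `vad_score` gives a VAD decision, thresholding `osd_score` gives an OSD decision, and the arg-max class gives the speaker count.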

Similar resources

Distant Speech Recognition Using Microphone Arrays

Speech is the most natural mode of communication, and distant speech recognition enables us to communicate conveniently with other devices without any body- or head-mounted microphones. But the real-world deployment of such systems comes with many challenges. This work seeks to address the two major challenges in such a system, namely noise and reverberation, by using microphone arrays. In this...

Robust speaker recognition using microphone arrays

This paper investigates the use of microphone arrays in hands-free speaker recognition systems. Hands-free operation is preferable in many potential speaker recognition applications; however, obtaining acceptable performance with a single distant microphone is problematic in real noise conditions. A possible solution to this problem is the use of microphone arrays, which have the capacity to enha...

Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments

It is well known that in reverberant environments, the human auditory system has the ability to pre-process reverberant signals to compensate for reflections and obtain effective cues for improved recognition. In this study, we propose such a preprocessing technique for combined detection and enhancement of speech using a single microphone in reverberant environments for distant speech applicat...

Speech Recognition Using Ad-hoc Microphone Arrays

While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close-talking microphones alleviates the feeling of discomfort and distraction to the...

Improved Overlapped Speech Handling for Speaker Diarization

We present our ongoing work in addressing the issue of overlapped speech in speaker diarization through the use of overlap segmentation, overlapped speech exclusion, and overlap segment labeling. Using feature analysis, we identify the most salient features from a candidate list including those from our previous system and a set of newly proposed features. In addition, through independent optim...

Journal

Journal title: Computer Speech & Language

Year: 2022

ISSN: 1095-8363, 0885-2308

DOI: https://doi.org/10.1016/j.csl.2021.101306